SIFT-based local spectrogram image descriptor: a novel feature for robust music identification
نویسندگان
چکیده
Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Therefore, robustness plays a crucial role in music identification technique. In this paper, we propose to use scale invariant feature transform (SIFT) local descriptors computed from a spectrogram image as sub-fingerprints for music identification. Experiments show that these sub-fingerprints exhibit strong robustness against serious time stretching and pitch shifting simultaneously. In addition, a locality sensitive hashing (LSH)-based nearest sub-fingerprint retrieval method and a matching determination mechanism are applied for robust sub-fingerprint matching, which makes the identification efficient and precise. Finally, as an auxiliary function, we demonstrate that by comparing the time-frequency locations of corresponding SIFT keypoints, the factor of time stretching and pitch shifting that music queries might have experienced can be accurately estimated.
منابع مشابه
DPML-Risk: An Efficient Algorithm for Image Registration
Targets and objects registration and tracking in a sequence of images play an important role in various areas. One of the methods in image registration is feature-based algorithm which is accomplished in two steps. The first step includes finding features of sensed and reference images. In this step, a scale space is used to reduce the sensitivity of detected features to the scale changes. Afterw...
متن کاملLocal gradient pattern - A novel feature representation for facial expression recognition
Many researchers adopt Local Binary Pattern for pattern analysis. However, the long histogram created by Local Binary Pattern is not suitable for large-scale facial database. This paper presents a simple facial pattern descriptor for facial expression recognition. Local pattern is computed based on local gradient flow from one side to another side through the center pixel in a 3x3 pixels region...
متن کاملLocal Image Descriptor using VQ-SIFT for Image Retrieval
In this paper, we present local image descriptor using VQ-SIFT for more effective and efficient image retrieval. Instead of SIFT's weighted orientation histograms, we apply vector quantization (VQ) histogram as an alternate representation for SIFT features. Experimental results show that SIFT features using VQ-based local descriptors can achieve better image retrieval accuracy than the conventi...
متن کاملA novel Local feature descriptor using the Mercator projection for 3D object recognition
Point cloud processing is a rapidly growing research area of computer vision. Introducing of cheap range sensors has made a great interest in the point cloud processing and 3D object recognition. 3D object recognition methods can be divided into two categories: global and local feature-based methods. Global features describe the entire model shape whereas local features encode the neighborhood ...
متن کاملEfficient music identification using ORB descriptors of the spectrogram image
Audio fingerprinting has been an active research field typically used for music identification. Robust audio fingerprinting technology is used to successfully perform content-based audio identification regardless of the audio signal being subjected to various types of distortion. These distortions affect the time-frequency correlation relating to pitch and speed changes. In this paper, experime...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- EURASIP J. Audio, Speech and Music Processing
دوره 2015 شماره
صفحات -
تاریخ انتشار 2015